Introduction

Though unintuitive at first, R offers a powerful way to visualize large data sets. With the right package, you can make charts and maps in R that look equally (if not better) than those you can make in ArcGIS or Illustrator. Most importantly, R is free and open-source and hence allows much smaller organisations (who cannot afford expensive ArcGIS or Adobe license) to leverage the power of GIS. More on ArcGIS and ESRI’s monopoly of GIS here. Whenever possible we should challenge the software monopoly of big corporations because:

  • It is bad for free market competition—without spending too much time on this, ArcGIS has many flaws as a spatial analysis product that it is not motivated to fix because there is no competition.

  • It silos public data in the hands of private corporations, creating a dependency on one organization. Why should ESRI get to monitise data collected by public entities using public tax dollars?

  • In this document, I hope to show you the basics of data visualizations in R so that you can keep making maps and graphs even after we loose educational ArcGIS and Adobe licenses.

Basics of ggplot

ggplot is one of the most popular R data visualization packages. ggplot stands for “Grammar of Graphics” and is a way to use language (rather than point and click) to create charts. STHDA has a really good introduction to it here, but the basic of ggplot is to view it as a collection of layers.

According to ggplot2 concept, a plot can be divided into different fundamental parts : Plot = data + Aesthetics + Geometry.

The principal components of every plot can be defined as follow:

data is a data frame

Aesthetics is used to indicate x and y variables. It can also be used to control the color, the size or the shape of points, the height of bars, etc…..

Geometry corresponds to the type of graphics (histogram, box plot, line plot, density plot, dot plot, ….)

You really can do a lot with it! Lets go through the steps of creating a simple bar chart in ggplot.

First, lets set up our environment and call our data. When setting up your data for plotting, remember that the column names appear as the axis labels. Hence, when I set up my pivoted data, I try to use column names that would make the most sense.

#start by installing the packages
if(!require(pacman)){install.packages("pacman"); library(pacman)}
## Loading required package: pacman
p_load(tidyverse, ggplot2, sf, tigris, tidycensus)

#call your variables, here I am using commuting statistics from the ACS

commute_vars <- c(
  total_workers = "B08006_001",
  drive_alone = "B08006_003",
  carpool = "B08006_004",
  public_transit = "B08006_008",
  walked = "B08006_015",
  bicycle = "B08006_014",
  worked_at_home = "B08006_017")

philly_commute.2022 <- get_acs(
  geography = "county",
  variables = commute_vars,
  state = "PA",
  county = "Philadelphia",
  year = 2022,
  survey = "acs5",
  output = "wide")%>%
    mutate( #create percentage classes for mapping
    Drive = drive_aloneE/total_workersE * 100,
    Carpool = carpoolE/total_workersE * 100,
    Transit = public_transitE/total_workersE * 100,
    Walk = walkedE/total_workersE * 100, 
    Bicycle = bicycleE/total_workersE * 100,
    WFH = worked_at_homeE/total_workersE * 100
  )%>%
  select(Drive, Carpool, Transit, Walk, Bicycle, WFH) %>%
  pivot_longer( #pivot to a cleaner look for plotting
    cols = everything(),
    names_to = "mode",
    values_to = "percentage"
  )
## Getting data from the 2018-2022 5-year ACS

Now that our data is set up, let us begin by creating a rough plot. ggplot works by layering objects on top of one another, so we start by specifying the data layer (here being bars of the bar chart).

R has more than 600 named colours that it recognizes. A list of all such colours can be found here.

Now that we have our base chart, we can begin our customizations. Lets start by geom_label (or geom_text) which adds text labels to our chart objects.

plot1+
  geom_text(#sets the text layer on top of the plot
    aes(label = paste0((round(percentage, 1)), "%")),#here, we want the y label to be percentage
    #we use round to make it look pretty
    nudge_y = 3, #we want the text to go up by 4 pixels
  )

We have our text labels! Stylistically I prefer using the geom_label function instead of geom_text as it renders labels with a filled background. So lets replace that.

plot1 <- plot1 +
  geom_label(
    aes(label = paste0((round(percentage,1)), "%")), 
    size = 4, # how big we want it to be
    nudge_y = 4, # move y by 4 px
    fill = "#F6AE2D", #this sets the fill color
    color = "white", #this sets the font color
    alpha = 0.9) #make the label transparent

plot1

Now, lets add more text to our plot, i.e. change the labels using the labs() command.

plot1 <- plot1 +
    labs( #set the labels for the chart
    title = "How Philadelphia Commutes to Work",
    subtitle = "Percentage by Commute Choice",
    caption = "Data: ACS 2022",
    x = NULL,  # Remove x and y axis label since it is self explanatory
    y = NULL
  ) 

plot1

The next thing you will need to do is specify the themes, which are other stylistic elements of the chart. Themes are where you would adjust other elements of the chart such as gridlines, text positions etc. You can choose between preset R themes, specify your own thing, or do both! There are a lot of themes you can choose from.

plot1 +
  theme_linedraw()

plot1 + 
  theme_classic()

Personally, I prefer theme_minimal, so that is what we will use for the plot.

plot1 <- plot1 +
  theme_minimal()

plot1

I prefer using the minimal theme alongside customizing other elements (such as fonts) using the theme function. Using the packages showtext and sysfonts, you can import google fonts or any other system fonts downloaded on your device.

#first, lets call in our packages
p_load(showtext,sysfonts)

#now, lets call in our desired font from google fonts
#here I want to call IBM Plex Sans from google font

font_add_google("IBM Plex Sans") #calls the font
showtext_auto() #initializes it in the environment

plot1 + 
    theme(
    text = element_text(family = "IBM Plex Sans", color = "#2F4858"),
    #sets style for all text
    plot.title = element_text(size = 16, face = "bold"), #adjust formatting of title
    plot.subtitle = element_text(size = 12), 
    axis.text = element_text(size = 11, family = "IBM Plex Sans", color = "#2F4858"))

The theme command gives you quite a lot of freedom to change the elements in your plot! I would encourage you to explore it further. For now, I want to adjust my gridlines and other plot elements.

plot1 <- plot1 +
    theme_minimal() + #sets a larger theme, google ggplot themes to see the options
  theme(
    text = element_text(family = "IBM Plex Sans", color = "#2F4858"), # set style for
    #all text
    plot.title = element_text(size = 16, face = "bold"), #adjust formatting of title
    plot.subtitle = element_text(size = 12), 
    axis.text = element_text(size = 11, family = "IBM Plex Sans", color = "#2F4858"),
    #set text style for axis text (which is different than plot text, idk why)
    panel.grid.minor = element_blank(), #get rid of minor gridlines
    plot.margin = margin(10, 10, 10, 10), # add padding
    plot.caption = element_text(size = 8, margin = margin(t = 10)), #pad down caption
    panel.grid.major.y = element_line(color = "#F6AE2D", size = 0.2, 
                                      linetype = "dotted"), 
    #have y grids
    panel.grid.major.x = element_blank() #and get rid of x axis grids
  )
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
plot1

And there is my plot! I can now export this chart to put in any document I see fit using the ggsave function. You can export it as a jpeg, a png, a pdf, or a svg to further edit in other software!

ggsave(plot = plot1,
       "commuteplot1.png", #change the .png to .svg or .pdf as needed
       bg = "transparent", #this will create a png with a transparent background
       #remove if you want the fill color as you set it 
       width = 10, #width of the image
       height = 5) #height of the image

A note of caution is that your saved plot might end up looking very different than your plot output in quarto. This happens because quarto will compress image outputs. Hence, you may need to play around with sizes of fonts and such in order to make your final plot look good.

Great, I hope you now understand what we mean when we say ggplot layers—its not too different than illustrator layers! Think of ggplot commands as steps you would need to do in illustrator—only here instead of mouse clicks, you memorize the steps instead. Lets move on to another example. As before, I call in my data and arrange it in the correct structure.

plot2_vars <- c(
  total_workers = "B08006_001",
  public_transit = "B08006_008",
  medhhinc = "B19013_001"
)

plot2_data <- get_acs(
  geography = "tract",
  variables = plot2_vars,
  state = "PA",
  county = "Philadelphia",
  year = 2022,
  survey = "acs5",
  output = "wide")%>%
    mutate(
    Transit = public_transitE/total_workersE * 100,
    Income = medhhincE
  )%>%
  select(GEOID, Transit, Income)%>%
  filter(!is.na(Income))
## Getting data from the 2018-2022 5-year ACS

For this plot, I want to use a new google font called Jost. I will not show how to call system fonts in this document as it is slightly complicated and I don’t know how you would do it on a windows device. However, if you are interested, you can read about it here.

I also call in the package scales, which is a nifty little tool to get various metrics (percentage, money) formatted to look pretty in your plot.

font_add_google("Jost")
showtext_auto()

p_load(scales)

Here, I am plotting a scatter plot. The scales package is especially useful when I am setting the labels of my x axis. Here, I use it shorten dollar amounts to the first three numbers and add the suffix k behind it.

plot2 <- ggplot(plot2_data,
       aes(x = Income, 
           y = Transit))+
  geom_point(
    stat = "identity",
    colour = "dodgerblue4",
    alpha = 0.6,
    size = 1.5
  )+
  scale_x_continuous(
    labels = scales::label_dollar(scale = 0.001, suffix = "k"))+
  #here, the scale = 0.001 basically means to multiply the number by 0.001
  #recall decimals from elementary school! Multiplying 500,000 by 0.001 
  #will give us 500!
  #then, I add k at the back to tie it to together.
  scale_y_continuous(
    limits = c(0, max(plot2_data$Transit)))
# i also set the limit of y axis as the max percentage value of transit

plot2

The rest of this section is about setting the theme and customisations. Here, I set the fill of the plot to a different colour. However, this means that I won’t be able to export it as a transparent png.

plot2+ 
  theme_minimal()+
  theme(
    text = element_text(family = "Jost", color = "steelblue4"),
    axis.text = element_text(size = 11, family = "IBM Plex Sans", color = "steelblue4"),
    panel.grid.minor = element_blank(),
    axis.title.x = element_text(vjust = -4),
    axis.title.y = element_text(vjust = 4),
    legend.title = element_text(size = 12),
    plot.margin = margin(10, 10, 10, 10),
    plot.caption = element_text(size = 8, margin = margin(t = 10)),
    plot.title = element_text(size = 14, face = "bold"),
    plot.background = element_rect(fill = "seashell", color = NA), #set background fill
    panel.background = element_rect(fill = "seashell", color = NA)
    #here, I have to set the fill of plot and panel separately because in ggplot
    #they are two different things. Play around with changing the fill colour to see
    #how it works
  ) +
  labs(
    title = "Less People Take Transit as Census Tracts Get Richer",
    subtitle = "Median Income by Transit Commuters in Philadelphia",
    caption = "Data: ACS 2022",
    x = "Median Household Income",
    y = "% of Transit Commuters")

Getting Crazy with gganimate

Okay one last trick—you can also animate your graphs using ggplot! For this, let us call in the data on affordable housing production in Philadelphia from OpenDataPhilly. Here, rather than downloading the data manually, I tell R to download the data directly from the url. You can do this too—just right click the data you want, copy the clean link, and put it in place of my url. You may need to change the calling function for different file types—since I am downloading a geojson, I use the st_read which reads spatial files. If you were calling in a csv, you would want to use read.csv.

screenshot
screenshot
#load in the packages that we will need
p_load(gganimate)


affordablehousing <- st_read("https://hub.arcgis.com/api/v3/datasets/ca8944190b604b2aae7eff17c8dd9ef5_0/downloads/data?format=geojson&spatialRefId=4326&where=1%3D1")
## Reading layer `Affordable_Housing' from data source 
##   `https://hub.arcgis.com/api/v3/datasets/ca8944190b604b2aae7eff17c8dd9ef5_0/downloads/data?format=geojson&spatialRefId=4326&where=1%3D1' 
##   using driver `GeoJSON'
## replacing null geometries with empty geometries
## Simple feature collection with 443 features and 11 fields (with 25 geometries empty)
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -75.24604 ymin: 39.90183 xmax: -74.9829 ymax: 40.12171
## Geodetic CRS:  WGS 84
plot3data <- affordablehousing %>%
  mutate(PROJECT_TYPE = case_when(PROJECT_TYPE == 
                                    "Rental;Special Needs" ~ "Special Needs",
                                  TRUE ~ PROJECT_TYPE),
         #simplifying one catagory for plotting
         Year = as.numeric(FISCAL_YEAR_COMPLETE))%>%
  group_by(Year, PROJECT_TYPE)%>%
  summarize(total_units = sum(TOTAL_UNITS)) %>%
  ungroup()%>%
  filter(!is.na(Year) &
         !is.na(total_units))
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
#removing NA values

As before, we manipulate our data and get it ready to plot. We also follow similar steps as before to create our base plot, with two key command differences. In this chart, I used scale_fill_manual to manually set the colours of my categorial data.

plot3 <- ggplot(plot3data,
       aes(x = PROJECT_TYPE, 
           y = total_units,
           fill = PROJECT_TYPE)) +
  geom_col(position = position_dodge(width = 0.9)) +
  scale_fill_manual(values = c( #when you are working with catagorial data
  "Rental" = "#2a9d8f",
  "Special Needs" = "#e76f51",
  "Homeownership" = "#f4a261"))+
  geom_label(
    aes(label = total_units), 
    size = 4, 
    nudge_y = 4,
    fill = "#e9c46a",
    color = "white",
    alpha = 0.9) +
  theme_minimal()+
  theme(
    text = element_text(color = "steelblue4"),
    axis.text = element_text(size = 11, color = "steelblue4"),
    legend.position = "none",
    plot.title = element_text(size = 20, face = "bold"),
    plot.subtitle = element_text(size = 12),
    panel.grid.major.y = element_line(color = "steelblue4", size = 0.2,
                                      linetype = "dotted"), 
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank()) +
  labs(
    title = "What types of Affordable Housing Has Philly Been Building?",
    subtitle = "Year: {closest_state}", #here, subtitle will be the closest state
    #to the animated frame, more on states below
    caption = "Data: Open Data Philly",
    x = NULL,
    y = "Total Units"
  )

plot3

The above section however, just gives me the rough plot of my chart. If you print it, it looks very weird because all the unit values from various years are mapped on top of one another. To animate it, you would need to add the following sections.

plot3 <- plot3 +
  #the segments that allow you to animate
  transition_states(Year, #the variable which will define the frames of animation 
                    #or "states". Here, the gif moves between various Years in ach frame
                    transition_length = 30, #the length of each transition
                    state_length = 20) + #how long the state should last
  ease_aes('linear') #the style of animating

I will not go too much into the details of gganimate because it is vast and you really only get better at it through trial and error. You can find a really good cheatsheet here if you are curious to learn further.

In short, to animate, you will just need to specify the variable through which you want to transition (here defined as Year). Once you have your specifications, you can render your gif using the animate command.

#uncomment these sections when you run the markdown
#however, you will need to encode the gifs seperatly to knit them
#to add gifs just write this in your file ![caption](file/path/to/gif.gif)

#plot3animate <- animate(plot3, 
                        #nframes = 29, #no of frames
                        #fps = 0.8, #frame per second
                        #width = 800, 
                        #height = 600)

#anim_save("plot3.gif", plot3animate)
Affordable Housing Through Years
Affordable Housing Through Years

You really can do a lot with gganimate! Here is another example, adapted from the gganimate documentation page here. In this plot, we will be showing change in life expectancy over GDP per capital in various countries. The dataset to do so is already loaded in R.

#calling a pre-existing dataset in R
p_load(gapminder)

plot4 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
scale_x_log10()+
  facet_wrap(~continent) +
  theme_minimal()+
  theme(
        text = element_text(color = "steelblue4"),
        axis.text = element_text(size = 11, color = "steelblue4"),
        panel.grid.major.y = element_line(color = "steelblue4", size = 0.2,
                                          linetype = "dotted"), 
        panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.x = element_blank(),
  )+
  labs(title = "Does GDP per capita correlate with Life Expectancy",
    subtitle = 'Year: {frame_time}', 
       x = 'GDP per capita', 
       y = 'Life expectancy') +
  transition_time(year) +
  ease_aes('linear')

#uncomment these sections when you run the markdown
#however, you will need to encode the gifs seperately to knit them
#to add gifs just write this in your file ![caption](file/path/to/gif.gif)
#plot4animate <- animate(plot4, 
        #width = 800, 
        #height = 600)

#anim_save("plot4.gif", plot4animate)
GDP by Life Expectancy
GDP by Life Expectancy

Spatial Data with Tigris and Geom_Sf

Okay, now lets try to plot some maps! To start, lets plot a simple chloropleth using the transit data we called previously. The package Tigris allows us to easily call shapefiles of various US political subdivisions and is a powerful tool when plotting US data.

philly <- tigris::tracts(state = "PA", county = "Philadelphia", year = 2022)
plot(philly[4])

A cool thing about tigris is that it can help you automatically erase sections of the map that would be covered by water. I find that having water bodies erased gives you prettier maps. I also transform the plot into a coordinate reference system more suitable for mapping Philadelphia.

philly <- philly %>%
  tigris::erase_water(area_threshold = 0.8)%>%
  st_transform(2272)
plot(philly[4])

If you notice, now my map has gaps in it to account for water bodies. Another cool thing you can do with tigris is to call polygons of water bodies in an area. Again, I find that this makes for a prettier plot. So lets call the water of all the neighbouring counties and then crop the shapefile to the extent of Philadelphia.

philly_water <- tigris::area_water(state = "PA", county = "Philadelphia", year = 2022)%>%
  rbind(tigris::area_water(state = "PA", county = "Montgomery", year = 2022))%>%
  rbind(tigris::area_water(state = "PA", county = "Delaware", year = 2022))%>%
  rbind(tigris::area_water(state = "PA", county = "Bucks", year = 2022))%>%
  rbind(tigris::area_water(state = "NJ", county = "Burlington", year = 2022))%>%
  rbind(tigris::area_water(state = "NJ", county = "Camden", year = 2022))%>%
  rbind(tigris::area_water(state = "NJ", county = "Gloucester", year = 2022))%>%
  st_transform(2272)%>%
  st_crop(st_bbox(st_buffer(philly, dist = 5000))) #buffer so it is philly and a bit more
ggplot() +
  geom_sf(data = philly, fill = "seashell3", color = NA, alpha = 0.6)+
  geom_sf(data = philly_water, fill = "dodgerblue4", color = "dodgerblue", alpha = 0.6)

We can also call up the shapefiles of PA and NJ that are adjacent to Philly. We don’t need any gridlines or variables in our data, so I specify the theme to be theme_void().

pa <- tigris::counties(state = "PA", year = 2022)%>%
  st_transform(2272)%>%
  st_crop(st_bbox(st_buffer(philly, dist = 5000)))%>%
  erase_water(area_threshold = 0.9)

jersey <- tigris::states()%>%
  filter(STUSPS == "NJ")%>%
  st_transform(2272)%>%
  st_crop(st_bbox(st_buffer(philly, dist = 5000)))%>%
  erase_water(area_threshold = 0.8)
ggplot() +
  geom_sf(data = pa, fill = "seashell3", col = "white", alpha = 0.9)+
  geom_sf(data = jersey, fill = "seashell3", col = "white", alpha = 0.9)+
  geom_sf(data = philly_water, fill = "dodgerblue4", col = "dodgerblue", alpha = 0.8) +
  geom_sf(data = philly, fill = "seashell", color = NA, alpha = 0.6)+
  theme_void()

Lets plot our data now. To start, lets join it with our philly census tract map and then add it as a layer.

philly <- philly %>%
  left_join(plot2_data) %>%
  mutate(Transit = case_when(
    is.na(Transit) ~ 0,
    TRUE ~ Transit
  ))
## Joining with `by = join_by(GEOID)`
plot5 <- ggplot() +
  geom_sf(data = pa, fill = "seashell", col = "gray", alpha = 0.6)+
  geom_sf(data = jersey, fill = "seashell", col = "gray", alpha = 0.6)+
  geom_sf(data = philly_water, fill = "dodgerblue4", col = "dodgerblue", alpha = 0.8) +
  geom_sf(data = philly, 
          aes(fill = Transit),
          col = "transparent",
          linewidth = 0.1)+
  scale_fill_gradient( #helps set color for continuous variables
    name = "% of Transit Commuters",
    high = "maroon4",
    low = "mistyrose")+
    coord_sf( #specifies the limits of map as limits of Philly
    xlim = c(st_bbox(philly)[1], st_bbox(philly)[3]),
    ylim = c(st_bbox(philly)[2], st_bbox(philly)[4])
  )+
  theme_void()

plot5

Lets now work on our customisations.

plot5 +
  labs(
    title = "Where do Transit Commuters Live in Philadelphia",
    caption = "Data: ACS 2022"
  )+
  theme(
        text = element_text(family = "Jost", color = "steelblue4"),
        plot.title = element_text(face = "bold", hjust = 0.5, size = 12),
        legend.position = "bottom", #moves the legend
        legend.key.width = unit(1, "cm"), #specifies its size
        plot.margin = margin(10, 10, 10, 10),
        legend.title = element_text(vjust = 1, hjust = 0.1)) #adjusts text

Lets try plotting point data of affordable housing next. To start, we would need to do some data manipulation. Annoyingly, point data is difficult to plot in ggplot if the X and Y coordinates are not stored in separate columns. Hence, to start, we would transform our affordable housing’s crs, and create two new columns that subset the longitude and latitude in geometry.

affordablehousing <- st_transform(affordablehousing, 2272)%>%
  mutate(
    X = st_coordinates(geometry)[,1],
    Y = st_coordinates(geometry)[,2]
  )

The steps that we would follow is quite similar to what has been outlines previously, except the introduction of a new function (guides). Guides helps you override aesthetic styling in the legend and make the individual circles in the legend bigger.

ggplot() +
  geom_sf(data = pa, fill = "seashell3", col = "white", alpha = 0.9)+
  geom_sf(data = jersey, fill = "seashell3", col = "white", alpha = 0.9)+
  geom_sf(data = philly_water, fill = "dodgerblue4", col = "dodgerblue", alpha = 0.8) +
  geom_sf(data = philly, fill = "seashell", color = NA, alpha = 0.6)+
  geom_point(data = affordablehousing, 
          aes(x = X, y = Y, color = PROJECT_TYPE), #specify X and Y
          size = 0.8, alpha = 0.9)+
  scale_color_manual(values = c("Homeownership" = "#7fc97f", #create the color scale
                               "Rental" = "#beaed4",
                               "Special Needs" = "#fdc086"),
                    name = "Project Type") +
  theme_void() +
  labs(
    title = "Affordable Housing in Philadelphia"
  )+
  theme(
    text = element_text(family = "Jost", color = "steelblue4"),
    plot.title = element_text(face = "bold", hjust = 0.5, size = 12, vjust = 0.5),
    legend.title = element_text(hjust = 0.5),
    legend.position = "bottom",
    legend.key.size = unit(5, "mm"),
    plot.margin = margin(10, 10, 10, 10))+
  guides(color = guide_legend(override.aes = list(size = 3))) #override aesthetic styling
## Warning: Removed 25 rows containing missing values or values outside the scale range
## (`geom_point()`).

And there is our map!